{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 🎯 Build a CSPOT Model \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The [executable notebook can be **downloaded here**](https://github.com/nirmallab/cspot/blob/main/docs/Tutorials/notebooks/BuildCSPOTModel.ipynb)  \n",
    "  \n",
    "**When following the tutorial, it is crucial to read the text as simply running the cells will not work!**\n",
    "  \n",
    "Please keep in mind that the [sample data](https://doi.org/10.7910/DVN/C45JWT) is used for demonstration purposes only and has been simplified and reduced in size. It is solely intended for educational purposes on how to execute `cspot` and will not yeild any meaningful results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Training a CSPOT Model involves the following steps:**\n",
    "- For any given marker: Identify a image that could be used to generate postive and negative thumbnails\n",
    "- Run the `generateThumbnails` function on the image to auto generate postive and negative thumbnails\n",
    "- Go through the auto generated  thumbnails and remove any wrong assignments\n",
    "- On the user sorted set of thumbnails run the `generateTrainTestSplit` function to prepare the data for training a deep learning model\n",
    "- Lastly, run `csTrain` "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import packages in jupyter notebook (not needed for command line interface users)\n",
    "import cspot as cs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**CSPOT auto generates subfolders and so always set a single folder as `projectDir` and cspot will use that for all subsequent steps.**  \n",
    "In this case we will set the downloaded sample data as our `projectDir`. My sample data is on my desktop as seen below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set the working directory & set paths to the example data\n",
    "projectDir = '/Users/aj/Documents/cspotExampleData'\n",
    "imagePath = projectDir + '/image/exampleImage.tif'\n",
    "spatialTablePath = projectDir + '/quantification/exampleSpatialTable.csv'\n",
    "markerChannelMapPath = projectDir + '/markers.csv'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step-1: Generate Thumbnails for Training Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first step would be to train a model to recognize the marker of interest. In this example the data contains 11 channels `DNA1, ECAD, CD45, CD4, CD3D, CD8A, CD45R, KI67` and as we are not interested in training a model to recognize DNA or background (`DNA1`), we will only need to generate training data for  `ECAD1, CD45, CD4, CD3D, CD8A & KI67`. However for proof of concept, let us just train a model for `ECAD` and `CD3D`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To do so, the first step is to create examples of `postive` and `negative` examples for each marker of interest. To facilitate this process, we can use the `generateThumbnails` function in `CSPOT`. Under the hood the function auto identifies the cells that has high and low expression of the marker of interest and cuts out small thumbnails from the image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing Marker: ECAD\n",
      "Processing Marker: CD3D\n",
      "Thumbnails have been generated, head over to \"/Users/aj/Documents/cspotExampleData/CSPOT/Thumbnails\" to view results\n"
     ]
    }
   ],
   "source": [
    "cs.generateThumbnails ( spatialTablePath=spatialTablePath, \n",
    "                        imagePath=imagePath, \n",
    "                        markerChannelMapPath=markerChannelMapPath,\n",
    "                        markers=[\"ECAD\", \"CD3D\"], \n",
    "                        markerColumnName='marker',\n",
    "                        channelColumnName='channel',\n",
    "                        transformation=True, \n",
    "                        maxThumbnails=100, \n",
    "                        random_state=0,\n",
    "                        localNorm=True, \n",
    "                        globalNorm=False,\n",
    "                        x_coordinate='X_centroid', \n",
    "                        y_coordinate='Y_centroid',\n",
    "                        percentiles=[2, 12, 88, 98], \n",
    "                        windowSize=64,\n",
    "                        projectDir=projectDir)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Same function if the user wants to run it via Command Line Interface**\n",
    "```\n",
    "python generateThumbnails.py \\\n",
    "            --spatialTablePath /Users/aj/Documents/cspotExampleData/quantification/exampleSpatialTable.csv \\\n",
    "            --imagePath /Users/aj/Documents/cspotExampleData/image/exampleImage.tif \\\n",
    "            --markerChannelMapPath /Users/aj/Documents/cspotExampleData/markers.csv \\\n",
    "            --markers ECAD CD3D \\\n",
    "            --maxThumbnails 100 \\\n",
    "            --projectDir /Users/aj/Documents/cspotExampleData/\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**The output from the above function will be stored under `CSPOT/Thumbnails/`.**  \n",
    "  \n",
    "There are a number of parameters that function need to provided as seen above. Detailed explanations are avaialable in the documentation. Briefly, the function takes in the single-cell table (`spatialTablePath`) with X and Y coordinates, the full image (`imagePath`) and lastly a list of `markers` for which thumbnails need to be generated. Please note as the program does not know which channels in the image corresponds to the `markers`, hence, the `markerChannelMapPath` is used to supply a `.csv` file that maps the channels to the marker information. The `markerChannelMap` follow 1-indexing convention- so the first channel is represented by the number `1`. \n",
    "  \n",
    "You would have also notices that I have set `maxThumbnails=100`. This basically means that even if more than 100 cells are identified, only 100 random cells will be used to generate the thumbnails. I generally generate about `2000` cells, however based on our estimates about 250 postive and 250 negative examples should be suffcient. As this is for illustration purpose only, I have set it to `100`.  \n",
    "  \n",
    "Now that the thumbnails are generated, one would manually go through the `TruePos` folder and `TrueNeg` folder and move files around as necessary. If there are any truly negative thumbnails in the `TruePos` folder, move it to `PosToNeg` folder. Similarly, if there are any truly positive thumbnails in `TrueNeg` folder, move it to `NegToPos` folder. You will often notice that imaging artifacts are captured in the `TruePos` folder and there will also likely be a number of true positives in the `TrueNeg` folder as the field of view (64x64) is larger than what the program used to identify those thumbnails (just the centroids of single cells at the center of that thumbnail).    \n",
    "  \n",
    "While you are manually sorting the postives and negative thumbnails, please keep in mind that you are looking for high-confident positives and high-confident negatives. It is absolutely okay to delete off majority of the thumbnails that you are not confident about. This infact makes it easy and fast as you are looking to only keep only thumbnails that are readily sortable.  \n",
    "  \n",
    "Lastly, I generally use a whole slide image to generate these thumbnails as there will be enough regions with high expression and no expression of the marker of interest. If you look at the thumbnails of this dummy example, you will notice that most thumbnails of `TrueNeg` for `ECAD` does contain some level of `ECAD` as there is not enough regions to sample from. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step-1a (optional)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You might have noticed in the above example, I had set `localNorm=True`, which is on by default. This parameter essentially creates a mirror duplicate copy of all the thumbnails and saves it under a folder named `localNorm`. The difference being that each thumbnail is normalized to the maximum intensity pixel in that thumbnail. It helps to visually sort out the true positives and negatives faster and more reliably. As we will not use the thumbnails in the `localNorm` for training the deep learning model, we want to make sure all the manual sorting that we did in the `localNorm` folder is copied over to the real training data. I have written an additional function to help with this. Any moving or deleting of files that you did in the `localNorm` folder will be copied over to the real training data.  \n",
    "  \n",
    "Randomly shift and delete some files from `TruePos` -> `PosToNeg` and `TrueNeg` -> `NegToPos`   for   `CD3D` for the purpose of illustration and run the  `cloneFolder` function to see what happens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing: CD3D\n",
      "Processing: ECAD\n",
      "Cloning Folder is complete, head over to /CSPOT/Thumbnails\" to view results\n"
     ]
    }
   ],
   "source": [
    "# list of folders to copy settings from\n",
    "copyFolder = [projectDir + '/CSPOT/Thumbnails/localNorm/CD3D',\n",
    "              projectDir + '/CSPOT/Thumbnails/localNorm/ECAD']\n",
    "# list of folders to apply setting to\n",
    "applyFolder = [projectDir + '/CSPOT/Thumbnails/CD3D',\n",
    "               projectDir + '/CSPOT/Thumbnails/ECAD']\n",
    "# note: Every copyFolder should have a corresponding applyFolder. The order matters! \n",
    "\n",
    "# The function accepts the four pre-defined folders. If you had renamed them, please change it using the parameter below.\n",
    "cs.cloneFolder (copyFolder, \n",
    "                applyFolder, \n",
    "                TruePos='TruePos', TrueNeg='TrueNeg', \n",
    "                PosToNeg='PosToNeg', NegToPos='NegToPos')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Same function if the user wants to run it via Command Line Interface**\n",
    "```\n",
    "python cloneFolder.py \\\n",
    "            --copyFolder /Users/aj/Desktop/cspotExampleData/CSPOT/Thumbnails/localNorm/CD3D /Users/aj/Desktop/cspotExampleData/CSPOT/Thumbnails/localNorm/ECAD \\\n",
    "            --applyFolder /Users/aj/Desktop/cspotExampleData/CSPOT/Thumbnails/CD3D /Users/aj/Desktop/cspotExampleData/CSPOT/Thumbnails/ECAD\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**If you head over to the training data thumbails you will notice that the files have been shifited around exactly as in the `localNorm` folder.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step-2: Generate Masks for Training Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To train the deep learning model, in addition to the raw thumbnails a mask is needed. The mask lets the model know where the cell is located. Ideally one would manually draw on the thumbnails to locate where the positive cells are, however for the pupose of scalability we will use automated approaches to generate the Mask for us. The following function will generate the mask and split the data into `training, validation and test` that can be directly fed into the deep learning algorithm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing: CD3D\n",
      "Processing: ECAD\n",
      "Training data has been generated, head over to \"/Users/aj/Documents/cspotExampleData/CSPOT/TrainingData\" to view results\n"
     ]
    }
   ],
   "source": [
    "thumbnailFolder = [projectDir + '/CSPOT/Thumbnails/CD3D',\n",
    "                   projectDir + '/CSPOT/Thumbnails/ECAD']\n",
    "\n",
    "# The function accepts the four pre-defined folders. If you had renamed them, please change it using the parameter below.\n",
    "# If you had deleted any of the folders and are not using them, replace the folder name with `None` in the parameter.\n",
    "cs.generateTrainTestSplit ( thumbnailFolder, \n",
    "                            projectDir=projectDir,\n",
    "                            file_extension=None,\n",
    "                            TruePos='TruePos', NegToPos='NegToPos',\n",
    "                            TrueNeg='TrueNeg', PosToNeg='PosToNeg')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Same function if the user wants to run it via Command Line Interface**\n",
    "```\n",
    "python generateTrainTestSplit.py \\\n",
    "            --thumbnailFolder /Users/aj/Desktop/cspotExampleData/CSPOT/Thumbnails/CD3D /Users/aj/Desktop/cspotExampleData/CSPOT/Thumbnails/ECAD \\\n",
    "            --projectDir /Users/aj/Desktop/cspotExampleData/\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you head over to `CSPOT/TrainingData/`, you will notice that each of the supplied marker above will have a folder with the associated `training, validataion and test` data that is required by the deep-learning algorithm to generate the model. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step-3: Train the CSPOT Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function trains a deep learning model for each marker in the provided training data. To train the `cspotModel`, simply direct the function to the `TrainingData` folder. To train only specific models, specify the folder names using the `trainMarkers` parameter. The 'outputDir' remains constant and the program will automatically create subfolders to save the trained models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:From /Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/keras/src/layers/normalization/batch_normalization.py:883: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Colocations handled automatically by placer.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/cspot/UNet.py:137: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).\n",
      "  bn = tf.nn.leaky_relu(tf.layers.batch_normalization(c00+shortcut, training=UNet2D.tfTraining))\n",
      "/Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/cspot/UNet.py:159: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).\n",
      "  lbn = tf.nn.leaky_relu(tf.layers.batch_normalization(\n",
      "/Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/cspot/UNet.py:162: UserWarning: `tf.layers.dropout` is deprecated and will be removed in a future version. Please use `tf.keras.layers.Dropout` instead.\n",
      "  return tf.layers.dropout(lbn, 0.15, training=UNet2D.tfTraining)\n",
      "/Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/cspot/UNet.py:224: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).\n",
      "  tf.layers.batch_normalization(tf.nn.conv2d(cc, luXWeights2, strides=[1, 1, 1, 1], padding='SAME'),\n",
      "/Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/cspot/UNet.py:245: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).\n",
      "  return tf.layers.batch_normalization(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/Users/aj/Documents/cspotExampleData/CSPOT/TrainingData/CD3D/training\n",
      "Training for 8 steps\n",
      "Found 122 training images\n",
      "Found 41 validation images\n",
      "Found 41 test images\n",
      "Of these, 0 are artefact training images\n",
      " and  0 artefact validation images\n",
      "Using 0 and 1 for mean and standard deviation.\n",
      "saving data\n",
      "saving data\n",
      "Using 16.0 and 0.0 for global max and min intensities.\n",
      "Class balance ratio is 18.959115759448537\n",
      "WARNING:tensorflow:From /Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Use `tf.cast` instead.\n",
      "WARNING:tensorflow:From /Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Use `tf.cast` instead.\n",
      "WARNING:tensorflow:From /Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: calling reduce_min_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "keep_dims is deprecated, use keepdims instead\n",
      "WARNING:tensorflow:From /Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: calling reduce_max_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "keep_dims is deprecated, use keepdims instead\n",
      "WARNING:tensorflow:From /Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Deprecated in favor of operator or tf.math.divide.\n",
      "step 00000, e: 0.506249, epoch: 1\n",
      "Model saved in file: /Users/aj/Documents/cspotExampleData/CSPOT/cspotModel/CD3D/model.ckpt\n",
      "step 00001, e: 0.500656, epoch: 1\n",
      "step 00002, e: 0.481774, epoch: 1\n",
      "step 00003, e: 0.466444, epoch: 1\n",
      "step 00004, e: 0.441243, epoch: 1\n",
      "step 00005, e: 0.492766, epoch: 1\n",
      "step 00006, e: 0.522770, epoch: 2\n",
      "step 00007, e: 0.508577, epoch: 2\n",
      "saving data\n",
      "loading data\n",
      "INFO:tensorflow:Restoring parameters from /Users/aj/Documents/cspotExampleData/CSPOT/cspotModel/CD3D/model.ckpt\n",
      "Model restored.\n",
      "/Users/aj/Documents/cspotExampleData/CSPOT/TrainingData/ECAD/training\n",
      "Training for 8 steps\n",
      "Found 120 training images\n",
      "Found 40 validation images\n",
      "Found 40 test images\n",
      "Of these, 0 are artefact training images\n",
      " and  0 artefact validation images\n",
      "Using 0 and 1 for mean and standard deviation.\n",
      "saving data\n",
      "saving data\n",
      "Using 70.0 and 6.0 for global max and min intensities.\n",
      "Class balance ratio is 6.6801200018750295\n",
      "step 00000, e: 0.498890, epoch: 1\n",
      "Model saved in file: /Users/aj/Documents/cspotExampleData/CSPOT/cspotModel/ECAD/model.ckpt\n",
      "step 00001, e: 0.494717, epoch: 1\n",
      "step 00002, e: 0.510676, epoch: 1\n",
      "step 00003, e: 0.480390, epoch: 1\n",
      "step 00004, e: 0.478806, epoch: 1\n",
      "step 00005, e: 0.516272, epoch: 1\n",
      "step 00006, e: 0.512058, epoch: 2\n",
      "step 00007, e: 0.480931, epoch: 2\n",
      "saving data\n",
      "loading data\n",
      "INFO:tensorflow:Restoring parameters from /Users/aj/Documents/cspotExampleData/CSPOT/cspotModel/ECAD/model.ckpt\n",
      "Model restored.\n",
      "CSPOT Models have been generated, head over to \"/Users/aj/Documents/cspotExampleData/CSPOT/cspotModel\" to view results\n"
     ]
    }
   ],
   "source": [
    "trainingDataPath = projectDir + '/CSPOT/TrainingData'\n",
    "\n",
    "cs.csTrain(trainingDataPath=trainingDataPath,\n",
    "               projectDir=projectDir,\n",
    "               trainMarkers=None,\n",
    "               artefactPath=None,\n",
    "               imSize=64,\n",
    "               nChannels=1,\n",
    "               nClasses=2,\n",
    "               nExtraConvs=0,\n",
    "               nLayers=3,\n",
    "               featMapsFact=2,\n",
    "               downSampFact=2,\n",
    "               ks=3,\n",
    "               nOut0=16,\n",
    "               stdDev0=0.03,\n",
    "               batchSize=16,\n",
    "               epochs=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Same function if the user wants to run it via Command Line Interface**\n",
    "```\n",
    "python csTrain.py \\\n",
    "        --trainingDataPath /Users/aj/Documents/cspotExampleData/CSPOT/TrainingData \\\n",
    "        --projectDir /Users/aj/Documents/cspotExampleData/ \\\n",
    "        --epochs 1\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this tutorial ends here. Move to the Run CSPOT Algorithm Tutorial"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  },
  "vscode": {
   "interpreter": {
    "hash": "b155a34da96f173027cc83af6ba86a1d662c2b9e09ee27c56baf0fff8044d14a"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
